Conversation
|
Review requested:
|
|
The
notable-change
Please suggest a text for the release notes if you'd like to include a more detailed summary, then proceed to update the PR description with the text or a link to the notable change suggested text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |
|
Nice! This is a great addition. Since it's such a large PR, this will take me some time to review. Will try to tackle it over the next week. |
| */ | ||
| existsSync(path) { | ||
| // Prepend prefix to path for VFS lookup | ||
| const fullPath = this.#prefix + (StringPrototypeStartsWith(path, '/') ? path : '/' + path); |
| validateObject(files, 'options.files'); | ||
| } | ||
|
|
||
| const { VirtualFileSystem } = require('internal/vfs/virtual_fs'); |
There was a problem hiding this comment.
Shouldn't we import this at the top level / lazy load it at the top level?
| ArrayPrototypePush(this.#mocks, { | ||
| __proto__: null, | ||
| ctx, | ||
| restore: restoreFS, |
There was a problem hiding this comment.
| restore: restoreFS, | |
| restore: ctx.restore, |
nit
lib/internal/vfs/entries.js
Outdated
| * @param {object} [options] Optional configuration | ||
| */ | ||
| addFile(name, content, options) { | ||
| const path = this._directory.path + '/' + name; |
lib/internal/vfs/virtual_fs.js
Outdated
| let entry = current.getEntry(segment); | ||
| if (!entry) { | ||
| // Auto-create parent directory | ||
| const dirPath = '/' + segments.slice(0, i + 1).join('/'); |
lib/internal/vfs/virtual_fs.js
Outdated
| let entry = current.getEntry(segment); | ||
| if (!entry) { | ||
| // Auto-create parent directory | ||
| const parentPath = '/' + segments.slice(0, i + 1).join('/'); |
lib/internal/vfs/virtual_fs.js
Outdated
| } | ||
| } | ||
| callback(null, content); | ||
| }).catch((err) => { |
There was a problem hiding this comment.
| }).catch((err) => { | |
| }, (err) => { |
lib/internal/vfs/virtual_fs.js
Outdated
| const bytesToRead = Math.min(length, available); | ||
| content.copy(buffer, offset, readPos, readPos + bytesToRead); |
lib/internal/vfs/virtual_fs.js
Outdated
| } | ||
|
|
||
| callback(null, bytesToRead, buffer); | ||
| }).catch((err) => { |
There was a problem hiding this comment.
| }).catch((err) => { | |
| }, (err) => { |
|
Left an initial review, but like @Ethan-Arrowood said, it'll take time for a more in depth look |
|
It's nice to see some momentum in this area, though from a first glance it seems the design has largely overlooked the feedback from real world use cases collected 4 years ago: https://github.com/nodejs/single-executable/blob/main/docs/virtual-file-system-requirements.md - I think it's worth checking that the API satisfies the constraints that users of this feature have provided, to not waste the work that have been done by prior contributors to gather them, or having to reinvent it later (possibly in a breaking manner) to satisfy these requirements from real world use cases. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #61478 +/- ##
==========================================
+ Coverage 89.68% 89.82% +0.13%
==========================================
Files 676 692 +16
Lines 206555 215687 +9132
Branches 39552 41259 +1707
==========================================
+ Hits 185249 193739 +8490
- Misses 13444 14061 +617
- Partials 7862 7887 +25
🚀 New features to boost your workflow:
|
|
And why not something like OPFS aka whatwg/fs? const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.mount('/app', rootHandle) // to make it work with fs
fs.readFileSync('/app/config.json')OR const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.readFileSync('sandbox:/config.json')fs.createVirtual seems like something like a competing specification |
5e317de to
977cc3d
Compare
I generally prefer not to interleave with WHATWG specs as much as possible for core functionality (e.g., SEA). In my experience, they tend to perform poorly on our codebase and remove a few degrees of flexibility. (I also don't find much fun in working on them, and I'm way less interested in contributing to that.) On an implementation side, the core functionality of this feature will be identical (technically, it's missing writes that OPFS supports), as we would need to impact all our internal fs methods anyway. If this lands, we can certainly iterate on a WHATWG-compatible API for this, but I would not add this to this PR. |
|
Small prior art: https://github.com/juliangruber/subfs |
8d711c1 to
73c18cd
Compare
|
I also worked on this a bit on the side recently: Qard@73b8fc6 That is very much in chaotic ideation stage with a bunch of LLM assistance to try some different ideas, but the broader concept I was aiming for was to have a module.exports = new VirtualFileSystem(new LocalProvider())I intended for it to be extensible for a bunch of different interesting scenarios, so there's also an S3 provider and a zip file provider there, mainly just to validate that the model can be applied to other varieties of storage systems effectively. Keep in mind, like I said, the current state is very much just ideation in a branch I pushed up just now to share, but I think there are concepts for extensibility in there that we could consider to enable a whole ecosystem of flexible storage providers. 🙂 Personally, I would hope for something which could provide both read and write access through an abstraction with swappable backends of some variety, this way we could pass around these virtualized file systems like objects and let an ecosystem grow around accepting any generalized virtual file system for its storage backing. I think it'd be very nice for a lot of use cases like file uploads or archive management to be able to just treat them like any other readable and writable file system. |
just a bit off topic... but this reminds me of why i created this feature request: Would not lie, it would be cool if NodeJS also provided some type of static example that would only work in NodeJS (based on how it works internally) const size = 26
const blobPart = BlobFrom({
size,
stream (start, end) {
// can either be sync or async (that resolves to a ReadableStream)
// return new Response('abcdefghijklmnopqrstuvwxyz'.slice(start, end)).body
// return new Blob(['abcdefghijklmnopqrstuvwxyz'.slice(start, end)]).stream()
return fetch('https://httpbin.dev/range/' + size, {
headers: {
range: `bytes=${start}-${end - 1}`
}
}).then(r => r.body)
}
})
blobPart.text().then(text => {
console.log('a-z', text)
})
blobPart.slice(-3).text().then(text => {
console.log('x-z', text)
})
const a = blobPart.slice(0, 6)
a.text().then(text => {
console.log('a-f', text)
})
const b = a.slice(2, 4)
b.text().then(text => {
console.log('c-d', text)
})An actual working PoC(I would not rely on this unless it became officially supported by nodejs core - this is a hack) const blob = new Blob()
const symbols = Object.getOwnPropertySymbols(blob)
const blobSymbol = symbols.map(s => [s.description, s])
const symbolMap = Object.fromEntries(blobSymbol)
const {
kHandle,
kLength,
} = symbolMap
function BlobFrom ({ size, stream }) {
const blob = new Blob()
if (size === 0) return blob
blob[kLength] = size
blob[kHandle] = {
span: [0, size],
getReader () {
const [start, end] = this.span
if (start === end) {
return { pull: cb => cb(0) }
}
let reader
return {
async pull (cb) {
reader ??= (await stream(start, end)).getReader()
const {done, value} = await reader.read()
cb(done ^ 1, value)
}
}
},
slice (start, end) {
const [baseStart] = this.span
return {
span: [baseStart + start, baseStart + end],
getReader: this.getReader,
slice: this.slice,
}
}
}
return blob
}currently problematic to do: also need to handle properly clone, serialize & deserialize, if this where to be sent of to another worker - then i would transfer a MessageChannel where the worker thread asks main frame to hand back a transferable ReadableStream when it needs to read something. but there are probably better ways to handle this internally in core with piping data directly to and from different destinations without having to touch the js runtime? - if only getReader could return the reader directly instead of needing to read from the ReadableStream using js? |
|
Should this live behind a CLI flag initially? It might be better to work through #62328 while the API is not exposed by default. |
Honestly... wouldn't hurt. |
The biggest risk of this PR is the patches on the modules' loading side and the fs side. You cannot put them behind a flag. The rest is just a new API with bugs (like many others). |
Add 18 feature-focused test files covering bug fixes from both analysis rounds, and fix lint issues in stats.js and setup.js. Test files added: - test-vfs-access.js: access validation and mode enforcement - test-vfs-buffer-encoding.js: buffer encodings and buffer path args - test-vfs-copyfile-mode.js: copyFile COPYFILE_EXCL support - test-vfs-dir-handle.js: Dir double-close and read/close callbacks - test-vfs-file-url.js: file: URL handling with fileURLToPath - test-vfs-mkdir-recursive-return.js: mkdirSync recursive return value - test-vfs-no-auto-mkdir.js: writes/opens don't auto-create parents - test-vfs-open-flags.js: read-only/write-only/exclusive/numeric flags - test-vfs-readdir-recursive.js: readdir recursive with names and dirents - test-vfs-readfile-fd.js: readFileSync with virtual fd - test-vfs-rename-safety.js: renameSync preserves source on failure - test-vfs-rm-edge-cases.js: rmSync dir/link edge cases - test-vfs-stats-bigint.js: BigInt stats support - test-vfs-stream-options.js: writestream start and stream fd option - test-vfs-symlink-edge-cases.js: broken/intermediate symlinks - test-vfs-watch-directory.js: directory and recursive watch - test-vfs-watchfile.js: unwatchFile cleanup and zero stats - test-vfs-writefile-flags.js: writeFile/appendFile flag support Lint fixes: - stats.js: use BigInt from primordials, add missing JSDoc @returns - setup.js: consolidate ERR_MODULE_NOT_FOUND import
|
Could those patches land separately from the flaggable part? That way it’d be much easier to ensure no regressions from either portion. (i ofc wouldn’t suggest splitting the PR until after it was otherwise philosophically landable) |
|
@ljharb maybe. But we would be landing code that is not tested, so we’d be in a worse stance? Note that experimental features are covered by our threat model. |
|
Surely either the patches are meant to be noops in isolation (in which case existing tests should be sufficient - or improved - to assure no breakage) or have independent functionality (which could ship with its own tests)? |
|
I don’t understand why. Ease of review? |
|
Ease of review, and if there’s any risk in patching core subsystems, that PR could also be released as a patch and then that assumption would be tested, and the VFS functionality could land separately as a minor. Certainly not required, but it might be prudent for assuring stability. |
- Check AbortSignal before VFS fast path in readFile/writeFile/appendFile - Honor options.flag in readFile/readFileSync on VFS paths - Route realpath.native/realpathSync.native through VFS handlers - Implement chmod/chown/utimes/lutimes in MemoryProvider instead of no-ops - Pass bigint option through VFSStatWatcher to statSync/createZeroStats - Coerce BigInt positions to Number in MemoryFileHandle read/write - Track nlink on MemoryEntry, increment on link, decrement on unlink - Update ctime alongside mtime on content mutations (write/truncate)
- Throw ERR_INVALID_PACKAGE_CONFIG for malformed package.json in VFS
instead of silently falling through to index.js resolution
- Move writeFile/appendFile options validation (getOptions, parseFileMode,
validateBoolean) before VFS fast path so invalid options are rejected
- Validate flags with stringToFlags() before VFS check in open/openSync
so invalid flag values like {} throw ERR_INVALID_ARG_VALUE
- Fix rmdirSync to not follow symlinks (use getEntry with false) so
symlinks to directories correctly throw ENOTDIR
- Update parent directory mtime/ctime when children are added or removed
in openSync, mkdirSync, rmdirSync, unlinkSync, linkSync, symlinkSync,
and renameSync
- Pass bigint option through to VFS statfs handlers and return BigInt
values when options.bigint is true
Convert VFS callback and promise code paths from calling sync handler methods to async handlers that return undefined (not handled) or a Promise (VFS handles it). This removes sync method calls from async code paths, making VFS safe for custom providers that do real I/O. Add DRY utilities (vfsRead, vfsOp, vfsOpVoid in setup.js and vfsResult, vfsVoid in fs.js) to eliminate repeated boilerplate across ~80 handler methods and ~40 fs functions. Add missing promises methods (chmod, chown, lchown, utimes, lutimes, open, lchmod) to file_system.js. Fix bugs where chown/lchown/lutimes async handlers called wrong sync methods. Fix invalid package.json handling to gracefully fall through in CJS context instead of throwing.
|
I’m moving this back to draft. I’m doing some refactoring and simplifying some paths for ease of review. I’ll add a review guide in the PR description and an architecture diagram explaining how all pieces fit together. |
Convert FD-based callback functions (close, read, write, fstat, ftruncate, fdatasync, fsync, fchmod, fchown, futimes, readv, writev) from sync handler + process.nextTick to async handlers using the undefined | Promise pattern, matching the approach already used for path-based operations. Add async FD handlers to setup.js that call the async methods on MemoryFileHandle (read, write, stat, truncate, close) instead of their sync counterparts, avoiding event loop blocking for custom VFS providers that do real I/O. Fix vfs.md documentation that was significantly out of date: - Remove false claim that chmod, chown, truncate, utimes, link, fdatasync, fsync have no VFS equivalent (all are implemented) - Add missing intercepted methods to the fs integration section (truncate, link, chmod, chown, lchown, utimes, lutimes, mkdtemp, lchmod, cp, statfs, opendir, readv, writev, ftruncate, fchmod, fchown, futimes, fdatasync, fsync) - Shrink "not intercepted" list to just glob/globSync - Add missing provider.supportsWatch documentation - Update overlay mode operation routing lists
Security: - Reject overlapping VFS mounts with clear error - Add allowWithPermissionModel opt-in for permission model - RealFSProvider: validate symlink targets, throw EACCES on escape API compatibility: - VirtualFileHandle: add no-op stubs and Symbol.asyncDispose - Intercept fsPromises.open() for VFS paths - Streams: add bytesRead/pending and bytesWritten/pending - VirtualDir: add Symbol.asyncDispose/Symbol.dispose - Stats: use dev=4085 and incrementing ino - Fix VFSStatWatcher listener signature Correctness: - Fix checkClosed to accept syscall parameter - Fix writeFileSync to use isAppend() for ax/ax+ flags - Fix dead ternary in hookProcessCwd - SEAProvider: cache asset sizes, recursive readdir, numeric flags - Streams: use inherited destroyed, EBADF on null fd Architecture: - Cross-VFS rename/copyFile/link throw EXDEV - Clear package.json caches on unmount - Convert readFile async handler to use promises Mock integration: - Make MockFSContext.vfs a private field with getter - Fix parentDir root check for Windows portability - restoreAll() collects errors into AggregateError - Remove obsolete skipped dynamic content test Provider fixes: - MemoryProvider: statSync uses contentProvider for dynamic size - MemoryProvider: recursive readdir follows symlinks - RealFSProvider: add watch/watchFile/unwatchFile support Code quality: - Cap watcher pending events queue at 1024 - Remove redundant destroyed field from streams
Replace the monolithic test-vfs-followups.js with individual test files: - test-vfs-overlapping-mounts.js - test-vfs-promises-open.js - test-vfs-stream-properties.js - test-vfs-dir-disposal.js - test-vfs-stats-ino-dev.js - test-vfs-append-write.js - test-vfs-cross-device.js - test-vfs-readdir-symlink-recursive.js - test-vfs-package-json-cache.js - test-vfs-readfile-async.js
|
Have we acknowledged and documented anywhere the limitations of a Node.js-specific VFS? For example, any native addon code, any child processes (written in either Node.js or other languages and runtimes) won't see the same view of the VFS. Which is why by default I advocate for solving these types of problems at the platform layer, i.e. Linux (including "SEA", to be honest). |
This is one of the reasons why I was suggesting that rather than this global |
Which is why such an hypothetical API wouldn't be used imo.
By contrast the transparent vfs has been proven to work in production for the past 6+ years (through Electron first, then more generally by Yarn PnP). |
This comment was marked as off-topic.
This comment was marked as off-topic.
There's not actually that many packages which actually interact with the file system. I don't think it would actually be as hard as you suggest to guide the ecosystem to adopting a cleaner pattern. To me, giving in to a bad pattern just because it's what the ecosystem currently does is lazy. We do in fact have the ability to drive adoption of new patterns and systems, as evidenced by projects adopting the built-in test framework which was added long after other options had already "won" that use case.
Adoption of ESM itself lagged significantly for a long time. That in combination with Loaders being a moving target for a very long time further lended to that lack of adoption. Also, it's fine if not everything supports it. I don't think it's wise to be potentially weakening our security model and increasing the surface area of things which users could break unintentionally in order to extend such functionality to code which was never intended to have such capability in the first place. In my opinion these things should be explicit. Too much magic is harmful. I recognize that not everyone agrees with me on this though, so I won't block on that point. But I still maintain that I dislike the design decision. |
Which is my point: leaving it up to the ecosystem to adapt for Node.js APIs because we couldn't make a safe design would be lazy. And a bad pattern. Using tests isn't a good analogy because not only have they marginal use in the wild compared to Jest / Vitest (due to being a recent feature, I'll give you that, although I think it supports what I was saying), it's also not a foundational feature. It's used at the app level, never in transitive packages. You simply don't have this poisonous pattern I raised earlier.
How is that weakening it? It's very vague. |
But this isn't a safe design, so what we're providing here is a bad pattern. The non-lazy thing to do would be to provide something with better defined boundaries and then put in the work to advocate for the ecosystem to adopt it. Taking the route of just trying to make a thing which fits into how things work presently is the lazy approach.
I've already gone over it in detail earlier in the thread. Any dependency anywhere can add custom mounts which could hijack any fs access anywhere in the entire thread. It could easily hijack things like reading certs, reading config files, etc. Malicious dependencies could easily hijack file access for nefarious purposes. |
I saw that, but I also saw mentioned that the Node.js threat model explicitly permits that:
|
|
Just because our threat model allows it doesn't mean it does not have a measurable impact on security. It just means we don't consider it important enough or feasible to address. But it is definitely a thing which can and probably will be attacked via malicious dependencies. |
|
I'd agree with @jasnell re. security. It's not in our threat model and it would be worse to create the false belief that there's any assurance for them to not be hijackable, because people have already been hijacking it and it's part of the hidden contract that's widely relied upon by the ecosystem that we cannot break. It's better to have a clear contract than not having contract at all and encourage people to invent their own way of hijacking it. For reference, see #62012 - everytime we subtly change any internal module loading order such that fs methods are cached internally, we can break popular ecosystem tools like Yarn PnP and a bunch others. Having a clear contract for vfs hooks means they can move on to use that clear contract and we no longer have to worry about these when touching internals. Not having a clear contract just means people will find their own way to hijack it that creates maintenance burden for us due to Hyrum's law.
I do not think that's so easy without explicit integration. Take reading certs for example - we don't read certs through fs, this PR (at least since my lasts review) doesn't force a C++ -> JS callback to go through VFS providers (this would be visible in code, and TBH difficult to get right even if you want to do it because the certificate initialization code can happen before calling into JS is possible, and it can even go off thread), so I don't think it'd allow that unless it's explicitly implemented. We have the same thing for |
ThanhDodeurOdoo
left a comment
There was a problem hiding this comment.
I'm not very familiar with node's internals but here are a few things that seem like potential issues
| if (r !== null) { | ||
| const fd = r.vfs.openSync(r.normalized, flags, mode); | ||
| const vfd = getVirtualFd(fd); | ||
| return PromiseResolve(vfd.entry); |
There was a problem hiding this comment.
wouldn't that make fs.open() return a virtual file handle object instead of a fd (number)?
edit: and fs.promises.open() returns a virtual handle instead of a FileHandle
| this[kOriginalChdir] = process.chdir; | ||
| this[kOriginalCwd] = process.cwd; |
There was a problem hiding this comment.
if we make multiple vfs with virtualCwd: true that can behave badly (like restauring stale refs), could it be explicitly disallowed?
|
|
||
| watchFile(vfsPath, options) { | ||
| const realPath = this.#resolvePath(vfsPath); | ||
| return fs.watchFile(realPath, options, () => {}); |
There was a problem hiding this comment.
VirtualFileSystem.watchFile() forwards the listener but here it's just dropped
A first-class virtual file system module (
node:vfs) with a provider-based architecture that integrates with Node.js's fs module and module loader.Key Features
Provider Architecture - Extensible design with pluggable providers:
MemoryProvider- In-memory file system with full read/write supportSEAProvider- Read-only access to Single Executable Application assetsVirtualProvider- Base class for creating custom providersStandard fs API - Uses familiar
writeFileSync,readFileSync,mkdirSyncinstead of custom methodsMount Mode - VFS mounts at a specific path prefix (e.g.,
/virtual), clear separation from real filesystemModule Loading -
require()andimportwork seamlessly from virtual filesSEA Integration - Assets automatically mounted at
/seawhen running as a Single Executable ApplicationFull fs Support - readFile, stat, readdir, exists, streams, promises, glob, symlinks
Example
SEA Usage
When running as a Single Executable Application, bundled assets are automatically available:
Public API
Disclaimer: I've used a significant amount of Claude Code tokens to create this PR. I've reviewed all changes myself.
Fixes #60021